Introduction
In many foraging environments the properties of the available food resources change. In order to adjust to and exploit changing resources, an animal can potentially learn and remember those properties. An animal can never learn everything about its environment; any learned information will always be incomplete, no matter how much effort is spent in obtaining it. One might then ask: when does the benefit of the learned information outweigh the cost of gaining it? The value of information lies in whether it can tell a forager something that changes its behaviour (Stephens 2007). When a forager’s behaviour allows it to experience environmental change, and so gain information about the current state of the world, this might be termed ‘tracking’. The information gained can then be translated into appropriate actions (Dunlap and Stephens 2012).
The value of seeking information depends on temporal parameters. When an environment changes rapidly, any information that is gained by tracking it becomes outdated very soon. When an environment changes so slowly that there is no consequence in the animal’s lifetime, any effort spent in tracking would not yield usable information. Furthermore, the benefit of tracking lies in allowing a forager to choose the best of the options available in an environment - for example, the option that results in the highest caloric gain. For certain combinations of environmental rates of change and differences in the quality of the available options, environmental tracking is both possible and beneficial, in the sense of resulting in a higher energetic net yield. Under some other circumstances it may be preferable to adopt a ‘one size fits all’ or averaging approach, where a forager applies one behavioural response that does best on average over all the possible environmental states (Stephens and Dunlap 2008). One might then ask: under what sort of environmental change is tracking, rather than applying an averaging response, beneficial?
An early attempt to model such a situation was made by Stephens (1987), who asked whether, and to what extent, a forager should modify its behaviour in response to a change in its environment. In this simple model the environment has a ‘variable’ option and a stable ‘alternative’ option. The latter has a single value, \(v_a\), and the former can vary between a good state, \(v_g\), and a bad state, \(v_b\), such that \(v_g > v_a > v_b\). The forager can recognise the type of resource (variable vs alternative) upon encounter, but must consume a resource to know its sub-type (good vs bad). The mechanism through which tracking happens is sampling, i.e., visiting the variable option when the last experience of it was the bad state, with the intention of learning what state it is in at the present time. The probability that the variable option stays the same from one encounter to the next is q. A forager can make two kinds of errors (i.e., choices for the less rewarding option) in this environment: an overrun error if the forager visits the alternative option when the variable is in its good state, and a sampling error if the forager visits the variable when it is in its bad state. The relative cost of these two errors is the ratio \(\epsilon\).
\[\epsilon = \frac{\text{Cost of a sampling error}}{\text{Cost of an overrun error}} = \frac{v_a - v_b}{v_g - v_a}\]
Thus the optimal sampling period, i.e., tracking, could be solved for in terms of the two variables q and \(\epsilon\). This simple model made several predictions. First, tracking behaviour should decrease with an increase in \(v_a\). This is because sampling errors become more costly and overrun errors become less costly: \(\epsilon\) increases. When \(v_a \ge v_g\), tracking behaviour should stop completely. Second, and conversely, tracking behaviour should increase with \(v_g\), as overrun errors become more costly, and \(\epsilon\) decreases. Third, tracking behaviour should decrease as q decreases, as the states of the variable option become more stable.
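To make the dependence of \(\epsilon\) on the option values concrete, the ratio can be computed directly from its definition. The following is only a minimal sketch: the function name and the reward values are hypothetical and chosen purely for illustration, not taken from Stephens (1987) or from any experiment.

```python
def error_cost_ratio(v_good, v_alt, v_bad):
    """Relative cost of a sampling error versus an overrun error (epsilon)."""
    if not (v_good > v_alt > v_bad):
        raise ValueError("the model assumes v_good > v_alt > v_bad")
    sampling_cost = v_alt - v_bad  # visited the variable option in its bad state
    overrun_cost = v_good - v_alt  # stayed at the alternative while the variable was good
    return sampling_cost / overrun_cost


# Hypothetical reward values, for illustration only
low_alternative = error_cost_ratio(v_good=10.0, v_alt=4.0, v_bad=2.0)
high_alternative = error_cost_ratio(v_good=10.0, v_alt=8.0, v_bad=2.0)
print(low_alternative, high_alternative)
```

With these illustrative numbers, raising \(v_a\) from 4 to 8 raises \(\epsilon\) from 1/3 to 3, so sampling errors become relatively more expensive as the alternative improves.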
These predictions were partially upheld by some experimental studies. Hummingbirds were found to decrease their sampling rates as the probability of change of the varying option decreased, as predicted, but did not avoid the variable option when the \(\epsilon\) value increased (Tamm 1987). Similarly, the behaviour of pigeons qualitatively conformed to the predictions of the model, but was quantitatively best explained by a model of choice where reward rate is maximized on a moment-to-moment basis based on scalar expectancy (Shettleworth et al. 1988). These experiments manipulated \(\epsilon\) but not q. When q was manipulated in an experiment with blue jays, presented with either a high or a low rate of change, both sampling and learning rates - i.e., tracking - were found to increase at faster rates of environmental change (Dunlap and Stephens 2012). Similarly, bumblebees sampled the variable resource more frequently when the probability of change was high, as predicted, but did not consistently choose the more rewarding option except when the probability of change was low and the potential reward was very high (Dunlap, Papaj, and Dornhaus 2017).
The merit of the Stephens model is that it outlines the minimum theoretical basis of the issue of environment tracking in order to generate quantitative predictions in a real environment. In a real-world context, however, it is instructive to consider the limitations of the model. A very important assumption of the model is that the forager not only knows the values of the parameters q and \(\epsilon\), but also knows the structure of the environment: that the variable option switches between a good and a bad state. A real foraging animal can only have a distribution of values as an estimate for the parameters, and can never know the whole structure of its environment. Indeed, since q is the probability of change at every encounter with the variable option, knowing the current state of the variable option does not say anything about what its state will be at the next encounter.
Another caveat that affects the predictions of the model is that sampling, by definition, should never occur when the forager is exploiting the variable option. When the state of the variable option is known, the state of the environment is known, so a subsequent visit to the stable alternative option will not yield any further information. Thus, the model’s predictions only apply to what a forager does when it is at the alternative option. However, the basic assumptions of the model, namely that (a) the forager never visits the stable option when exploiting the variable one in its good state, and (b) the forager immediately switches to the stable option when the variable option switches to its bad state, are not met in any of the systems that have been studied. This is a serious issue because it means that while the conceptual contributions and rationale of the model are still valuable, its quantitative predictions are not valid because its assumptions are not fulfilled. These different kinds of foraging errors are discussed in a study by Commons, Kacelnik, and Shettleworth (2013), which offers a series of models for a similar situation but in which the strategies are based on the observation that assumptions of the Stephens model are not met in real datasets.
In this study, consisting of two experiments, we attempted an empirical implementation of the model to study the tracking behaviour of the nectar-feeding bat Glossophaga mutica (Calahorra-Oliart, Ospina-Garcés, and León-Paniagua 2021). The natural foraging environment of these animals consists mainly of flowers that contain varying levels of nectar. From the point of view of an individual bat that encounters a flower, the nectar levels change constantly: increasing gradually according to the flower’s nectar secretion rate and decreasing according to how many competitors are present in the environment. Bats must constantly compare flowers in different states: full, partially full, or empty.
In our experiments we placed the bats in an environment containing exactly two ‘flowers’: a flower that always yielded the same volume of reward - a fixed option - and a flower that yielded a reward whose volume changed as a sine function of time, increasing and decreasing. We termed the latter a ‘fluctuating’ option instead of a ‘variable’ option, to differentiate it from an option that could only be in two states, good and bad. While most previous empirical tests of tracking models manipulated either the rate of environmental change (q) or the relative cost of the two kinds of errors (\(\epsilon\)), we varied the equivalents of both parameters systematically.
The average relative cost of sampling the two options was determined by the volume of the fixed option. An additional factor is that behaviour may not be driven directly by the absolute real values, but by how they are perceived, and it may be useful to take into account how perception works. In many foraging situations, animals discriminate between relevant variables such as reward magnitudes and time costs according to Weber’s Law, which states that the just-noticeable difference in a stimulus is proportional to the magnitude of the stimulus (Fechner 1860; see Kacelnik and Brito e Abreu 1998 for its application to foraging). In one experiment the fixed option yielded a reward at the arithmetic mean of the maximum and minimum volumes of the fluctuating option. In the other, the fixed output was smaller than the arithmetic mean. By fixing it at the geometric mean of the extremes of the fluctuating option, we aimed at making the fixed volume equally discriminable from the minimum and maximum values of the fluctuating option, that is, we fixed it at the fluctuating option’s ‘subjective’ mean.
The environmental rate of change in our experiment was determined by the period of the sine function governing the fluctuating output: the shorter the period, the faster the change. In both experiments the bats experienced the same four periods. It is important to note that in this study the rate of environmental change does not correspond exactly to q, as the fluctuating option changes, not probabilistically, but systematically. From the point of view of the bats, the reward on an encounter with the fluctuating option has changed since the last encounter when it is discriminably different. The shorter the period, the more likely it is that the fluctuating output is different for a given encounter rate, and so the period is an equivalent of the parameter q.
Stephens’ model applies to a situation where a foraging agent that is perfectly informed about its environment would follow the model’s predictions. This is because the system described by the model is intrinsically stochastic: it behaves according to some probability of changing state. Therefore, even an ideal forager would show errors in its behaviour in such a system. In our experiment, however, the system is deterministic, so an ideal agent would in fact behave optimally without any error at all, allocating its behaviour entirely to whichever option was yielding a higher reward at any point in time. A realistic agent, on the other hand, does not know everything, even in a deterministic scenario. From the point of view of a real agent, the system does behave as if it were stochastic. For these reasons, our experiment is inspired by Stephens’ model but not designed to test it. It is an empirical study aimed at understanding whether and how bats exploit fluctuations in their environment.
We redefined tracking behaviour in our experiment as an outcome, along the lines of Dunlap, Papaj, and Dornhaus (2017): allocating choice behaviour by matching the option yielding the larger reward at the time of each choice (see Figure whatever - I will insert an explanatory figure later in the Methods where it is appropriate). This is in contrast to the original mathematical model and some previous studies which put tracking in terms of sampling as its mechanism. A closer match between an animal’s choice behaviour and the state of the environment meant that the animal was tracking better: a perfectly tracking bat would always choose the fluctuating output when it was larger than the fixed, and choose the fixed when it was larger than the fluctuating.
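One way to picture this outcome-based definition is as the fraction of choices that went to whichever option was yielding the larger reward at that moment. The sketch below is hypothetical and only illustrates the idea: the function name, the data layout and the example visits are assumptions, and it is not the statistical analysis used in this study.

```python
def tracking_score(choices, fixed_volume):
    """Fraction of choices made to the option with the larger current reward.

    choices: iterable of (chosen_option, fluctuating_volume) pairs, where
    chosen_option is "fixed" or "fluctuating" (hypothetical data layout).
    """
    matched = 0
    scored = 0
    for chosen, fluctuating_volume in choices:
        if fluctuating_volume == fixed_volume:
            continue  # neither option is better, so the choice is not scored
        better = "fluctuating" if fluctuating_volume > fixed_volume else "fixed"
        scored += 1
        matched += chosen == better
    return matched / scored if scored else float("nan")


# Made-up visits: (option chosen, fluctuating output in microlitres at that time)
example_choices = [
    ("fluctuating", 25.0),
    ("fluctuating", 10.0),
    ("fixed", 3.0),
    ("fixed", 20.0),
]
score = tracking_score(example_choices, fixed_volume=7.0)
print(score)
```

A perfectly tracking bat would score 1 under this definition; in the made-up example three of the four choices match the better option, giving 0.75.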
We predicted that tracking would be better a) when the period of the sine function was larger, i.e., the environment was changing more slowly, and b) when the contrast between the fixed and fluctuating options was higher. The latter condition was satisfied, not when the fixed output was the arithmetic mean, but when it was the subjective mean. By definition the subjective mean was equally discriminable from the best and worst fluctuating outputs, whereas the arithmetic mean was less discriminable from the best fluctuating output than from the worst. We referred to the experiment where the fixed option was the subjective mean as the ‘high contrast’ experiment and the one where the fixed option was the objective (arithmetic) mean as the ‘low contrast’ experiment.
We also investigated how much the bats had learned about the structure of their environment. We did not expect the bats to learn the complex rule of the environment, i.e., that the fluctuating output varied sinusoidally. Instead, we thought it was possible for the bats to detect an increasing or decreasing trend in the fluctuating output and for this to influence their choice behaviour. Thus we compared the choice for fluctuating volumes when these volumes were part of a downward trend, to the same volumes when they were part of an upward trend.
Materials and Methods
Subjects and housing
Both experiments were done at the Cognitive Neurobiology Lab at the Humboldt Universität zu Berlin: the high contrast experiment in December, 2019; the low contrast experiment in June and July, 2020. The experiments were performed with two different sets of individual bats, and were identical in their design and procedure except for the one critical difference of the volume of reward delivered by the fixed option (see Experiment Schedule below).
Bats of the species Glossophaga mutica from a captive colony at the Humboldt Universität were used for the experiment. The colony was a breeding population housed at 18-24\(^\circ\)C and 45-70% humidity on a 12-hour light-dark cycle (light phase: 0200 to 1400 CET; 0300 to 1500 CEST). In this colony every bat older than approximately a year (judged through the ossification of the finger joint - Brunet-Rossinni and Wilkinson, 2009) was assigned a permanent ID number, which shall be referred to from now on in order to distinguish the individuals. The bats that were selected for the experiment were a mix of animals that had previously been exposed to the experimental apparatus, and naive ones. None of the bats had participated in such an experiment, or a similar one, before. 16 animals completed the high contrast experiment: 11 females and 5 males. 18 animals completed the low contrast experiment: 10 females and 8 males.
Experimental Setup
The experimental setup was common to both experiments.
Reward
The reward received by the bats during the experiment was also their main source of food. The reward was a 17 ± 0.2% by weight solution of sugar dissolved in water (prepared fresh every day or every other day), hereafter referred to as ‘nectar’. The sugar consisted of a 1:1:1 mass-mixture of glucose (“Traubenzucker,” Müller’s Mühle GmbH, Germany), sucrose (“Zucker,” Belbake, Südzucker AG, Germany) and fructose (“Fruchtzucker,” Hamburger Zuckerhandelsgesellschaft mbH, Germany). The nectar was thus similar in composition and concentration to the nectar produced by wild chiropterophilous plants (Baker, Baker, and Hodges 1998).
Experimental Apparatus
The animals were placed in individual, adjacent cages (0.7 x 2.2 x 1.5 m) for the duration of the experiment. As there were six cages in total, the experiment was carried out in batches of six bats at a time, and each individual progressed through the experiment independently of all the others. Each cage had an operant wall with two electronic reward-dispensing devices spaced approximately 30 cm apart, hereafter referred to as ‘flowers’ (figure 1 and figure 2). Each flower had a circular head and a door controlled by a linear-actuator motor that could move up and down. Just inside the head of the flowers was an infra-red light barrier, and at the back of the flower was a Teflon tube that supplied the nectar to the flower (figure 3). Each Teflon tube was connected to a short piece of soft peroxide-silicone tube that ran through a pinch-valve.
The Teflon tubes were connected to a syringe pump in a branching design that ensured the length of tube between every flower and the pump was exactly equal to 470 cm. The pump was placed outside the cages on a shelf, inaccessible to the bats. The syringe of the pump was a Hamilton 25 mL glass syringe (Sigma Aldrich, Germany) and connected to the tubing system of the flowers through five pinch valves on the pump. These pinch valves controlled the flow of liquid from the pump to the system and from a reservoir of liquid to the pump. The reservoir (500 mL thread bottle, Roth, Germany) was filled with fresh nectar every day and connected to the syringe through the valves.
The flowers and the pump were connected by ethernet cables to a laptop computer (ThinkPad, IBM) that stood outside the cages. This computer ran the experimental schedule, as well as the program used to clean and fill the systems, using the PhenoSoft Control program (Phenosys GmbH, Germany). To trigger a reward a bat had to place its nose inside the flower and break the infra-red light barrier. This sent a signal to the computer, which triggered the pinch-valve to open and the pump to move the correct number of steps.
General Experimental Procedure
Data collection was completely automated and happened for 12 hours every day. The experimental animals were kept on the same light-dark cycle as the bats in the colony and were active during the dark phase, which is when the data were collected. The experiment was prepared every day in the morning during the light phase. The animals were inspected every day to make sure they were healthy and flying well. Then a preliminary analysis of the data from the previous night was done on the laptop running the experimental program using a Shiny App written in R, to make sure the program had been executed correctly and the bats had drunk sufficient nectar. The minimum quantity of nectar was an amount that yielded 25 kilojoules of energy. Any bat that drank less than this amount was given honey water for an hour before the start of the experiment.
The old nectar was flushed from the system using the automated PhenoSoft program and the system was refilled with fresh nectar. Twice a week, the pump and tubing system were thoroughly rinsed with 70% ethanol and de-calcified water to remove pathogens.
At approximately 1800 h the data were checked to see if all the bats had made at least two visits to the flowers, and thus learned to trigger rewards. If bats had not made visits, they received ad-libitum honey water for the rest of the experimental night and they were replaced with another animal on the next night.
The bats were given supplemental food in addition to the nectar from the flowers. 0.2 g of a powdered nectar mixture (NEKTAR-Plus, NEKTON, Germany) and 0.3 g of milk powder (Milasan “Folgemilch 2,” Sunval Baby Food, Germany) mixed in approximately 1 mL of water, and 2 mL of plain water were given to each bat. These supplements were put into Eppendorf tubes attached to the operant wall of the cage, about 87 cm below the flowers. The additional food was such that the bats would prefer to visit the flowers instead, both because the flowers were at a more comfortable height for the animals and because the nectar had a higher sugar content and was preferred to the milk powder-nectar supplement mix. The additional food was given firstly to supply micronutrients to the bats while they were in the experiment, and secondly to ensure the animals received a sufficient number of calories in case there was a technical system failure or the bats did not make a sufficient number of visits to the flowers. No technical failures occurred during either experiment.
Once an animal had completed the experiment, it was removed from the cage, weighed to see if it had lost weight since the start of the experiment, released back into the colony and replaced with another bat.
During the experimental night, when the syringe of the pump had been fully emptied, the pump had to refill with nectar from the reservoir. This event happened on average 3.85 times per night (SD = ± 0.26), taking 6.6 minutes each time (SD = ± 1.63). During this time, if the bats made visits to the flowers, they did not receive any reward.
Experiment Schedule
In both experiments, one option was the ‘fixed’ option and the other was the ‘fluctuating’ option. The fluctuating option delivered a reward that varied as a sine function of time, starting at its maximum volume when a bat made its first visit to the fluctuating option, and proceeding through the sine-function regardless of where the bat made its subsequent visits. In the high contrast experiment the reward delivered by the fixed option was selected so that the volume pairs of the fixed option and the minimum output of the sine-wave, and the fixed option and the maximum output of the sine-wave were, in principle, equally discriminable. This was based on the relative intensity of the volume pairs, calculated as follows:
\[\frac{\mathrm{volume}_1 - \mathrm{volume}_2}{(\mathrm{volume}_1 + \mathrm{volume}_2)/2}\]
In the low contrast experiment, the output of the fixed option was the arithmetic mean of the peak and trough volumes of the fluctuating option, and so was less discriminable from the peak than the trough. The maximum volume of the fluctuating option, i.e., the peak of the sine-wave, was 25 \(\mu\)L, and the minimum was 2 \(\mu\)L, so the output of the fixed option was 7 \(\mu\)L in the high contrast experiment and 13.5 \(\mu\)L in the low contrast experiment.
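The relative-intensity calculation can be checked numerically for the volumes stated above. The following is a minimal sketch: the function and variable names are ours, and taking the absolute difference (so that the order of the two volumes does not matter) is an assumption added for convenience.

```python
import math


def relative_intensity(volume_1, volume_2):
    """Difference between two volumes divided by their mean."""
    # abs() is an added convenience so that argument order does not matter
    return abs(volume_1 - volume_2) / ((volume_1 + volume_2) / 2)


PEAK_UL = 25.0    # maximum fluctuating volume (from the text)
TROUGH_UL = 2.0   # minimum fluctuating volume (from the text)

geometric_mean = math.sqrt(PEAK_UL * TROUGH_UL)   # about 7.07
arithmetic_mean = (PEAK_UL + TROUGH_UL) / 2       # 13.5

for label, fixed in [("geometric mean", geometric_mean),
                     ("7 uL (high contrast)", 7.0),
                     ("13.5 uL (low contrast)", arithmetic_mean)]:
    to_trough = relative_intensity(fixed, TROUGH_UL)
    to_peak = relative_intensity(PEAK_UL, fixed)
    print(f"{label}: vs trough {to_trough:.3f}, vs peak {to_peak:.3f}")
```

The exact geometric mean (about 7.07 \(\mu\)L) gives identical relative intensities against the peak and the trough (about 1.118). The 7 \(\mu\)L value used in the high contrast experiment gives nearly equal values (about 1.111 and 1.125), whereas 13.5 \(\mu\)L gives about 1.484 against the trough but only about 0.597 against the peak.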
The experiment proceeded through the following stages:
Pre-training
On the first day of the experiment the bats were placed inside the cages and allowed to acclimatize to the new environment. The flowers were covered with a towel to prevent the animals accessing them, and containers of honey water were placed on top of the covered flowers, which the bats found easily. On this day alone no other food was given, not even the supplementary mixture. Food was only available at the location of the flowers. No data were recorded by the computer on this day, and the amount of honey-water consumed was not monitored.
Training
Shortly before 1400 h, the towels were removed from the flowers so the bats could access them. To teach the bats to put their noses into the flower head and trigger the reward, a drop of honey was applied to the back of the flower and a drop to the top of the flower.
The training proceeded in five phases that repeated throughout the night. Whenever the bats completed 50 visits to both flowers in total, the phase ended and the next began.
Initial: The doors in front of the flowers remained open, and the bats could pay a visit to whichever flower they wanted. The bats received a reward volume of 25 \(\mu\)L at both flowers.
Forced 1: This was a phase of forced alternation. At the start of this phase, the door in front of one of the flowers moved up to prevent access to it, forcing the bat to visit the other one. After a visit was made and the reward collected, the door of the visited flower would move up to block access to it, and the door of the other flower would open. In this way the bat was forced to alternate its visits to the two flowers, which ensured that the locations of both flowers were learned. In this phase there was a difference in reward volume between the two flowers. Two pairs of volumes were possible: the fixed output and 2 \(\mu\)L; or the fixed output and 25 \(\mu\)L. Depending on which experiment it was, the fixed output was either 7 \(\mu\)L (the subjective mean) or 13.5 \(\mu\)L (the objective mean). Half the bats were given one volume pair, and the other half the other volume pair. The flower on which the higher volume was given was counter-balanced across animals.
Free 1: This was a phase of ad-libitum reward similar to the Initial phase: both flower doors were open so both flowers were freely accessible to the bats. The volume differences of the Forced 1 phase were maintained. As the bats were free to visit both flowers, the preference of the bats for the flower that gave the higher volume was taken as indication of the discriminability of the volumes.
Forced 2: This phase was the same as the Forced 1 phase except the volume pairs were different. Those bats that received the fixed output vs. 2 \(\mu\)L volume pair in the Forced 1 phase now received 25 \(\mu\)L vs. the fixed output and vice versa. Half the bats received the higher volume at the same flower as Forced 1 and the other half at the other flower.
Free 2: This was similar to the Free 1 phase, in that both flowers were accessible and reward was ad-libitum, but the reward volumes at the flowers were the same as those in the phase Forced 2. In this way the bats’ preferences for the higher volume of both volume pairs was determined.
After the bats had completed all five phases, the schedule repeated itself except for the Initial phase. This continued for the rest of the night. If a bat learned to trigger rewards and made visits, but not a sufficient number to experience all five phases at least once, it had to repeat the Training stage on the next night. If the bat did not complete all five phases even on the second day of Training, it was removed from the experiment and replaced.
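The cycling of training phases described above can be summarised as a small lookup from a bat's visit count to its current phase. This is a hypothetical sketch, not the PhenoSoft schedule itself: the function names are ours, and it assumes that every phase, including the Initial one, lasts exactly 50 visits counted across both flowers.

```python
VISITS_PER_PHASE = 50  # visits to both flowers in total (from the text)
FIRST_PASS = ["Initial", "Forced 1", "Free 1", "Forced 2", "Free 2"]
REPEATING = FIRST_PASS[1:]  # the Initial phase is not repeated


def training_phase(visit_index):
    """Phase in force for a given visit (0-based count over the night)."""
    block = visit_index // VISITS_PER_PHASE
    if block < len(FIRST_PASS):
        return FIRST_PASS[block]
    return REPEATING[(block - len(FIRST_PASS)) % len(REPEATING)]


def completed_all_phases(total_visits):
    """True if a bat made enough visits to finish all five phases once."""
    return total_visits >= VISITS_PER_PHASE * len(FIRST_PASS)


print(training_phase(0), training_phase(120), training_phase(260))
print(completed_all_phases(249), completed_all_phases(250))
```

Under these assumptions a bat needs 250 visits in a night to experience all five phases once, after which the four non-Initial phases cycle in order.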
Main Experiment
The bats experienced four experimental conditions, corresponding to four periods of the sine wave:
- 0.75 hours
- 1.5 hours
- 3 hours
- 6 hours
The period of the wave was the time interval between two consecutive peaks or troughs. During each experimental night the bats were given free choice between the fixed option and the fluctuating option, whose output varied as a sine function of time, calculated as follows:
\[ y(t) = A \sin(2\pi f t + \varphi) + D \]
where:
- A is the Amplitude of the wave, or the distance between the peak and the mid-value of the wave
- f is the frequency of the wave, or the reciprocal of the wave period in seconds
- t is the time point in seconds since the start of the wave
- \(\varphi\) is the Phase, specifying in units of radians where the wave is when t = 0
- D is the Displacement, i.e., the non-zero mid-value around which the wave oscillates
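The sine function can be written out with the volumes given for the fluctuating option. This is a sketch under stated assumptions: the amplitude and displacement are derived here from the 25 \(\mu\)L peak and 2 \(\mu\)L trough, the phase of \(\pi/2\) is inferred from the statement that the wave starts at its maximum, and any rounding of volumes to discrete pump steps is not modelled. The function and constant names are ours.

```python
import math

PEAK_UL = 25.0
TROUGH_UL = 2.0
AMPLITUDE = (PEAK_UL - TROUGH_UL) / 2      # A = 11.5 (derived, not stated in the text)
DISPLACEMENT = (PEAK_UL + TROUGH_UL) / 2   # D = 13.5 (derived, not stated in the text)
PHASE = math.pi / 2                        # assumed: the wave starts at its peak


def fluctuating_volume(t_seconds, period_hours):
    """Reward volume (microlitres) t_seconds after the start of the wave."""
    frequency = 1.0 / (period_hours * 3600.0)  # reciprocal of the period in seconds
    angle = 2 * math.pi * frequency * t_seconds + PHASE
    return AMPLITUDE * math.sin(angle) + DISPLACEMENT


for period in (0.75, 1.5, 3.0, 6.0):
    half_period_s = period * 3600.0 / 2
    print(period,
          round(fluctuating_volume(0.0, period), 3),
          round(fluctuating_volume(half_period_s, period), 3))
```

For every period the output starts at 25 \(\mu\)L and reaches the 2 \(\mu\)L trough half a period later; only the time taken to get there differs between conditions.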
The bats first experienced a condition for a night, during which the fixed and fluctuating options were assigned to a flower location each, and this location did not change. On the following night there was a reversal of options, i.e., a reversal of the reward contingencies of the flowers: the flower that had previously been the fixed option was now the fluctuating, and vice versa. This was done to control for a location preference by the bats. After the bats had experienced a condition on two successive nights in this way, the next condition was given, so there were 4x2 or 8 experimental nights in total (in addition to the training). The order of the conditions was pseudo-randomized across animals.
On the first night of the main experiment the fluctuating option was assigned to the flower that each bat had made more visits to overall on the previous training night, as it was assumed that the animals now had a slight preference for this flower. From then on the reversal of reward contingencies between the two flowers happened every night. At the start of each experimental night, the sine function that determined the fluctuating output did not begin until the bat made a visit to the fluctuating option. Then the bat experienced the peak of the wave, i.e., the highest possible fluctuating output (25 \(\mu\)L). This was a large reward, and was designed to motivate the bats to make repeated visits to the fluctuating option so they could experience the change in the output (see Supplementary Information).
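The night-by-night design described above (four periods, two nights each, with the fluctuating option switching flowers every night) can be laid out as a simple schedule. The sketch below is illustrative only: the flower labels "left" and "right" are hypothetical, and a plain random shuffle stands in for the pseudo-randomization procedure, whose details are not specified here.

```python
import random

PERIODS_HOURS = [0.75, 1.5, 3.0, 6.0]
FLOWERS = ("left", "right")  # hypothetical labels for the two flower positions


def build_schedule(first_fluctuating_flower, rng):
    """Return a list of (night, period_hours, fluctuating_flower) tuples."""
    order = PERIODS_HOURS[:]
    rng.shuffle(order)  # stand-in for the pseudo-randomized condition order
    other = FLOWERS[1] if first_fluctuating_flower == FLOWERS[0] else FLOWERS[0]
    schedule = []
    for night in range(2 * len(order)):
        period = order[night // 2]  # each condition lasts two successive nights
        flower = first_fluctuating_flower if night % 2 == 0 else other
        schedule.append((night + 1, period, flower))
    return schedule


schedule = build_schedule("left", random.Random(1))
for night, period, flower in schedule:
    print(f"night {night}: period {period} h, fluctuating option at {flower} flower")
```

Each condition therefore appears once with the fluctuating option at each flower, which is what allows a location preference to be separated from a preference for an option.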
Data analysis
The raw data from these experiments were logged as events by a computer and recorded in comma-separated value (CSV) files. Each event included the date and time of the event, the animal that made the event, the duration of interruption of the photo-gate and the volume of nectar dispensed. The CSV files were then read into R, which was used for all statistical analyses and creation of plots.
A bat had to experience the reward contingencies of both options on every night to be included for the statistical analysis. In practice this meant that the bat had to make at least one rewarded visit to both options every night.
The bats experienced the reward volumes in the fluctuating options as part of either a downward trend, when the fluctuating output was decreasing, or part of an upward trend, when the fluctuating output was increasing. In both cases the volume difference between the fixed and the fluctuating options was exactly the same but the difference was in the volume differences experienced just before. The bats could use their past experience in one of two ways: they could either estimate an option as being more rewarding based on their reinforcement at that option in the recent past; or they could estimate an option as being more rewarding based on their experience of an increasing reward output at that option, despite the recent past reinforcement being comparatively low. In the first case, we would expect the proportion of visits to any volume of the fluctuating option to be higher when that volume was part of a downward trend; in the second case, we would expect the proportion of visits to be higher when the volume was part of an upward trend. We also considered the specific case of the volume pair 7 vs. 13.5 \(\mu\)L. In the subjective mean experiment this situation arose when the fluctuating output was 13.5 \(\mu\)L because the fixed output was always 7 \(\mu\)L; in the objective mean experiment it arose when the fluctuating output was 7 \(\mu\)L as the fixed output was always 13.5 \(\mu\)L. The volume pair was discriminable by the bats, and if there were no effect of trend, the preference for the higher volume should be higher than 50% in all the experimental conditions.
In both the experiments, we investigated the effect of trend, volume of the fluctuating output and rate of change of the fluctuating option on the proportion of visits to the fluctuating option. In both the experiments we also created separate models of one specific pair of volumes: 7 and 13.5 \(\mu\)L. In the subjective mean experiment this was when the fluctuating output was 13.5 \(\mu\)L, and in the objective mean experiment this was when the fluctuating output was 7 \(\mu\)L. We investigated the effect of trend on the proportion of visits made to the higher volume in this pair (13.5 \(\mu\)L) in both experiments. The proportion of visits to the fluctuating output was calculated as the number of visits to the fluctuating output divided by the sum of the number of visits to the fluctuating output and the number of visits to the fixed option in that category. The proportion of visits to the higher volume of a volume pair was calculated in a similar manner.
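The proportion described above can be illustrated with a short calculation that groups visits by trend. This is a simplified, hypothetical sketch: it groups by trend only (not by volume), assumes the wave starts at its peak so that the output falls during the first half of each period and rises during the second, assigns the exact peak to the downward trend and the exact trough to the upward trend, and uses made-up visit data. It is not the mixed-model analysis itself.

```python
from collections import defaultdict


def trend_at(t_seconds, period_seconds):
    """Trend of the fluctuating output at time t, for a wave starting at its peak."""
    position = (t_seconds % period_seconds) / period_seconds
    return "downward" if position < 0.5 else "upward"


def proportion_fluctuating(visits, period_seconds):
    """Proportion of visits to the fluctuating option within each trend.

    visits: iterable of (t_seconds, chosen_option) pairs, where chosen_option
    is "fluctuating" or "fixed" (hypothetical data layout).
    """
    counts = defaultdict(lambda: {"fluctuating": 0, "fixed": 0})
    for t_seconds, option in visits:
        counts[trend_at(t_seconds, period_seconds)][option] += 1
    return {
        trend: c["fluctuating"] / (c["fluctuating"] + c["fixed"])
        for trend, c in counts.items()
    }


# Made-up visits for the 0.75 h condition (period = 2700 s)
example_visits = [
    (100, "fluctuating"), (200, "fixed"), (300, "fluctuating"),
    (400, "fluctuating"), (2000, "fixed"), (2500, "fluctuating"),
]
proportions = proportion_fluctuating(example_visits, period_seconds=2700)
print(proportions)
```

In this made-up example, three of the four visits during the downward trend and one of the two visits during the upward trend went to the fluctuating option, giving proportions of 0.75 and 0.5.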
Generalized linear mixed models were fitted in a Bayesian framework using Hamiltonian Monte Carlo in the R package brms (Bürkner 2017), which is a front-end for rstan (Carpenter et al. 2017). The technical details of these models are provided in the Supplementary section. We present plots of the conditional effects of the predictor variables, with the parameter values of the models provided in the Supplementary section. We report the mean as a measure of central tendency and the 89% quantile-based credible intervals for the parameters (89% boundaries are the default for reporting credible intervals; McElreath 2020).